Data Center Locations
Our primary data centers are located in N. Virginia, and all customer information is stored and processed in the continental US. For details on the underlying AWS infrastructure, see https://aws.amazon.com/about-aws/global-infrastructure/
Zero-Downtime Deploys
Deployments to our production environment, whether for bug fixes or new features, require no downtime, so customers can continue using Narrative Science products without any lapse in functionality.
Business Continuity
Narrative Science leverages AWS to provide a high degree of availability and fault tolerance. Our Application Load Balancers (ALBs) route traffic to a cluster of API servers located across multiple redundant Availability Zones (AZs). The computing assets that make up the NS platform are distributed across multiple Availability Zones so that the system remains available if one or two data centers fail.
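As a rough illustration of the fault-tolerance property described above, the sketch below checks whether any API servers remain after a set of Availability Zones fails. The AZ names, server counts, and function names are hypothetical and are not taken from our actual deployment.

```python
from typing import Dict, Set

# Hypothetical placement of API servers per Availability Zone (illustrative only).
PLACEMENT: Dict[str, int] = {
    "us-east-1a": 2,
    "us-east-1b": 2,
    "us-east-1c": 2,
}


def healthy_servers(placement: Dict[str, int], failed_azs: Set[str]) -> int:
    """Count servers located in Availability Zones that have not failed."""
    return sum(count for az, count in placement.items() if az not in failed_azs)


def remains_available(
    placement: Dict[str, int], failed_azs: Set[str], min_servers: int = 1
) -> bool:
    """Return True if at least `min_servers` servers survive the AZ failures."""
    return healthy_servers(placement, failed_azs) >= min_servers
```

With three zones in this assumed layout, `remains_available(PLACEMENT, {"us-east-1a", "us-east-1b"})` is true, which mirrors the "one or two data centers" tolerance stated above.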
24x7x365 Monitoring and Protection
Narrative Science’s application performance and security are monitored 24x7x365. Our monitoring systems consolidate metrics across our infrastructure. On-call engineers are notified as soon as an alert triggers. If the on-call engineer does not acknowledge the alert within a few minutes, our alert system automatically escalates it up a triage tree that includes senior Engineering management.
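The escalation behavior described above can be sketched as a simple timed walk up a triage tree. This is only a minimal model: the levels, contact names, and acknowledgment timeouts below are hypothetical, since the text specifies only "a few minutes" and that the tree includes senior Engineering management.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationLevel:
    contact: str
    ack_timeout_minutes: int  # how long this level has to acknowledge


# Hypothetical triage tree; real levels and timeouts are not given in the text.
TRIAGE_TREE: List[EscalationLevel] = [
    EscalationLevel("on-call engineer", 5),
    EscalationLevel("secondary on-call engineer", 5),
    EscalationLevel("senior engineering management", 10),
]


def contacts_notified(
    minutes_unacknowledged: int, tree: List[EscalationLevel] = TRIAGE_TREE
) -> List[str]:
    """Return every contact paged once an alert has gone unacknowledged
    for the given number of minutes."""
    notified: List[str] = []
    elapsed = 0  # minutes at which the current level is first paged
    for level in tree:
        if minutes_unacknowledged < elapsed:
            break
        notified.append(level.contact)
        elapsed += level.ack_timeout_minutes
    return notified
```

In this model, the first level is paged immediately, and each later level is paged only after every earlier level's timeout has passed without acknowledgment.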
We leverage AWS CloudWatch for our central logging system. All instances are configured to send application, security, and system logs to CloudWatch. The ability to make changes to the central logging system is limited to a few members of the Operations team. Security logs are also automatically archived to a separate secure storage location for safekeeping. Logs are discarded after 60 days.
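For illustration, a 60-day retention period like the one described above can be applied to a CloudWatch log group through the CloudWatch Logs `put_retention_policy` call. The sketch below is an assumption about how this might be scripted, not a description of our actual tooling. The helper name and the log group name are hypothetical, and the client is passed in so the function can be exercised without AWS access.

```python
RETENTION_DAYS = 60  # retention period stated in the text


def apply_retention(logs_client, log_group_name: str, days: int = RETENTION_DAYS) -> None:
    """Set the retention period on one CloudWatch Logs log group.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
```

With boto3 installed and credentials configured, this could be called as `apply_retention(boto3.client("logs"), "/example/application")`, where the log group name is a placeholder.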
Security
We monitor Common Vulnerabilities and Exposures (CVEs) that affect our environment. We perform vulnerability assessments on a quarterly basis, using scanning tools to detect security vulnerabilities.
On a quarterly basis, we perform a penetration test against our internal office network using tools that include Burp Proxy Suite and OpenVAS. This helps ensure that we follow security best practices and catch the latest security issues.
Each quarter, we complete an audit of system and application security. The audit includes the following checks:
All user accounts and access levels
Configuration compliance
Network vulnerability
Facility security and compliance
Capacity review
System upgrades
Narrative Science has implemented a security information and event management (SIEM) system to monitor Narrative Science systems for security-related events. Many of our systems and applications forward events to the SIEM, AlienVault USM (AV). AV ingests logs from multiple AWS resources, including CloudWatch, storage resources (S3 logs), AWS CloudTrail, and Load Balancers, as well as application and system logs. AV analyzes information from those sources to provide real-time analysis of security events across the Narrative Science environment. It correlates events and supports threat detection, incident response, compliance management, and real-time threat intelligence across all systems that it monitors, looking for patterns of behavior that may indicate a security-related event has occurred. Details are covered in the Threat Analysis section below. The threat information is then presented to the security team in graphical form, which allows administrators to quickly assess the seriousness and scope of a threat and respond appropriately.
Threat Analysis
Our security-scanning systems follow a multi-phased threat analysis process to generate threat intelligence logic, using information collected from over 3,000,000 threat indicators every day, including malicious IP addresses and URLs, domain names, malware samples, and suspicious files. The information is gathered from a wide range of sources, including:
External threat vendors
Open-sourced high-interaction honeypots that we set up to capture the latest attacker techniques and tools
Community-contributed threat data in the form of OTX “pulses”, along with anonymized data voluntarily contributed by USM and OSSIM users
This information is analyzed to produce the following:
Correlation directives – USM ships with well over 2,700 pre-defined rules that translate raw events into specific, actionable threat information
Network IDS signatures – detect the latest threats in your network
Host IDS signatures – detect the latest threats targeting your critical systems
Asset discovery signatures – identify the latest operating systems, applications, and devices
Vulnerability assessment signatures – find the latest vulnerabilities on your systems
Reporting modules – provide new ways of viewing data about your environment and satisfying auditor and management requests
Dynamic incident response templates – utilize customized guidance on how to respond to each alert
Newly supported data source plugins – expand your monitoring footprint by incorporating data from third-party tools
Escalations are sent to the security team based on the threat analysis described above. If events indicate that a bad actor may have compromised a system, they are assigned a high criticality rating, clearly highlighted in AlienVault (AV), our alerting system, and escalated to the security team according to their criticality and severity ratings. Security alerts that are not serious are logged and listed in the AV interface but do not generate emails or alerts.
The Operations team monitors AV closely. Serious events are reviewed within one business day. Less severe alerts are reviewed weekly to determine if any response or corrective action is needed.
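The routing rules described above can be summarized in a short sketch. This is a simplified model under stated assumptions: the two-level severity scale and all names are hypothetical, and AlienVault's actual criticality and severity ratings are more granular than shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical two-level scale; the real ratings are more granular."""
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Handling:
    notify_security_team: bool  # whether an email or alert is generated
    logged_in_av: bool          # whether the event is listed in the AV interface
    review_window: str          # how soon the Operations team reviews it


def route_alert(severity: Severity) -> Handling:
    """Map an alert's severity to the handling described in the text."""
    if severity is Severity.HIGH:
        # Possible compromise: escalate and review within one business day.
        return Handling(True, True, "one business day")
    # Not serious: logged and listed only, reviewed weekly.
    return Handling(False, True, "weekly")
```

For example, `route_alert(Severity.LOW)` yields an event that is listed in the AV interface and reviewed weekly but sends no notification.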
Incident Management and Response Procedures
We have a well-defined and tested on-call and incident management process that helps on-call technicians quickly triage an issue and escalate effectively to the needed resources. Incident management procedures ensure that effective communication channels are set up, with pre-established touch-points with engineering and business management. The following steps are part of every incident management process:
Incident Classification - Determines who should be involved in the troubleshooting/recovery process and what communications should be sent out. Use the table below to assist in classifying incidents.
Communication Guidelines - How should teams communicate, who should be involved, and what escalations need to be made (both internal and external). Establish a timeline for external and internal communication.
Determine Functional Impact/Scope - What is the impact on internal and external users? Is it a potential security breach, an impact on availability, or both?
Recoverability & Eradication - Determine whether recovery efforts should focus on fixing existing systems or on restoring/rebuilding new systems. Once recovery is complete, fully verify that systems are working as expected, then make sure they are returned to their normal working state and that all information is secure.
Lessons Learned - Follow our incident follow-up process to capture the root cause analysis and lessons learned, and track follow-up items to completion.
Narrative Science has an incident response form that is filled out for every incident that qualifies for the incident response plan. The form is sent to the teams involved in the incident and to the Cloud Engineering manager for approval.