CascadiaFest: Server JS Morning Part 1 Notes

A little sleepy this morning, but I’m certain the talks will help wake us up with the first part of CascadiaJS Server Day.

Jennifer Wong: I Think I Know What You’re Talking About, But I’m Not Sure

Sometimes when I’m in conversation with other developers, it’s … what?

“There are two hard things in computer science. Cache invalidation and naming things.”

Our conversations can get really complicated. A lot of our jargon makes no sense.

Parameter vs. Argument

What’s the difference? Is there one?

parameter – a measurable factor which helps to define a particular system; in programming, refers to the variable as found in the function definition

argument – statements and reasoning in support of a proposition; in programming, the argument (or "actual parameter") refers to the actual input passed

In Math there is a clear definition, but in computer science, the definition is flipped.

Parameters help define a function. Arguments are passed into a function.
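A minimal sketch of the distinction. The `area` function is made up for illustration and is not from the talk:

```javascript
// `width` and `height` are parameters: names in the function definition.
function area(width, height) {
  return width * height;
}

// 3 and 4 are arguments: the actual values passed in at the call site.
const result = area(3, 4);
console.log(result); // 12

// `length` on a function counts its declared parameters, not the arguments.
console.log(area.length); // 2
```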

Colloquially, we use them interchangeably.

Scope

the portion of source code in which a binding of a name with an entity applies

We have had the term since 1960, but it is still difficult to define.

Lexical vs. Dynamic Scope

Lexical Scope: portion of the source code; a variable’s definition is resolved by searching its containing block or function, then moving to the outer containing block.
Dynamic Scope: portion of run time; the calling function is searched, then the function which called that function, and so on.
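A small sketch of the difference, using made-up names (`label`, `show`, `caller`). JavaScript variables are lexically scoped, so the code can only demonstrate the lexical result; the dynamic result is described in the comments:

```javascript
const label = 'global';

function show() {
  // `label` is resolved where `show` is written, not where it is called.
  return label;
}

function caller() {
  const label = 'caller'; // a different binding, visible only inside `caller`
  return show();
}

console.log(caller()); // "global" under lexical scope
// Under dynamic scope the lookup would walk the call stack instead,
// find `label` in `caller`, and yield "caller".
```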

Recursion

refers to a method of defining functions in which the function being defined is applied within its own definition…

behaviour defined by 1) a simple base case, 2) a set of rules that reduce all other cases toward the base case

a function that calls itself

Example: Factorials
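A minimal sketch of the factorial example, showing the base case and the reducing rule. The input check is my own addition, not something from the talk:

```javascript
function factorial(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError('n must be a non-negative integer');
  }
  if (n === 0) return 1;       // 1) simple base case
  return n * factorial(n - 1); // 2) rule that reduces toward the base case
}

console.log(factorial(5)); // 120
```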

Concatenation

a linking together, to link together

Acronyms

CRM vs. CMS vs. CDN
* Customer Relationship Management (Salesforce)
* Content Management System (WordPress)
* Content Delivery Network (Amazon CloudFront)

SaaS vs. Sass
* Software as a Service
* Syntactically Awesome Style Sheets (CSS extension language)

DOM
* Document Object Model: a structured representation of the document that defines a way the structure can be accessed.

GUI & CLI
* GUI: Graphical User Interface
* CLI: Command-Line Interface

AJAX
* jQuery’s AJAX method has an async option that can be set to false -> SJAJ

Take Away

Really explain the concepts to people, especially when someone gives you a confused look.

Slides

Will Scott: Scanning the internet with Node.js

You can’t just map where a site originates based on its CDN.

Measuring the Internet’s Stars – What can we learn about internet availability from a single measurement machine?

DNS: My computer asks the DNS resolver to get the number (the IP address) for a name. If the resolver doesn’t know, it will ask one of the nameservers. There are lots of services that will do this for you.

* ~8 million resolvers
* ~5000 ISPs
* ~169 countries with >20 resolvers

Scans to find DNS resolvers and then have them resolve domains to see what answers they give.
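A rough sketch of asking several resolvers the same question with Node's built-in `dns` module. The function name `askResolvers` and the `createResolver` parameter are my own inventions (the latter only makes the sketch testable without a network); this is not the code from the talk:

```javascript
const { Resolver } = require('dns').promises;

// Ask each resolver for the A records of `domain` and collect the answers.
// `createResolver` is a hypothetical injection point; by default it builds
// a real Node.js resolver.
async function askResolvers(domain, resolverIps, createResolver = () => new Resolver()) {
  const answers = {};
  for (const ip of resolverIps) {
    const resolver = createResolver();
    resolver.setServers([ip]); // send queries to this resolver only
    try {
      answers[ip] = await resolver.resolve4(domain);
    } catch (err) {
      // No response or a failed lookup is still a data point.
      answers[ip] = { error: err.code || err.message };
    }
  }
  return answers;
}

// Usage (needs network access, so it is left commented out):
// askResolvers('example.com', ['8.8.8.8', '1.1.1.1']).then(console.log);
```

A real scan would run these queries concurrently; the sequential loop just keeps the sketch short.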

HTTP
* ~100 million servers
* ~20,000 HTTP Proxies (~70 countries)

Trying this from a bunch of directions turns out to be basically as much trouble as dealing with a distributed system in the first place.

IPs

An IPv4 address is 32 bits, written as 4 parts, e.g. 192.168.1.1

When the address space seemed limitless, blocks were given out as class A, then B and C. Then the question came up: what if someone wants more than 256 addresses? It became clear the space is limited.

CIDR – how many bits of prefix are fixed, i.e. how much is owned by the organization, e.g. 192.168.1.1/24
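To make the prefix idea concrete, here is a small sketch that turns an IPv4 address into its 32-bit number and checks whether it falls inside a CIDR block. The helper names `ipToInt` and `inCidr` are made up for these notes:

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`invalid IPv4 address: ${ip}`);
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when `ip` shares the first `bits` bits with the CIDR base address.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid prefix length: ${cidr}`);
  }
  // 2^(32 - bits) addresses share one prefix; plain arithmetic avoids
  // the signed 32-bit results of JavaScript's bitwise operators.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipToInt(ip) / blockSize) === Math.floor(ipToInt(base) / blockSize);
}

console.log(ipToInt('192.168.1.1'));                    // 3232235777
console.log(inCidr('192.168.1.77', '192.168.1.1/24')); // true
console.log(inCidr('192.168.2.1', '192.168.1.1/24'));  // false
```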

Geolocation tables e.g. MaxMind

Problems:
* might get multiple hits for the same IP
* might not get any response

There must be a definitive map. Its technical incarnation is BGP. Many networks publish their BGP tables.

Problems: the table is ~2 GB, ~21 million lines; need to find the more specific entries, because there is some overlap

Build a table for fast lookup, reasonable size, creation speed.
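One way to handle the overlap is a longest-prefix lookup: try the most specific prefix first and fall back to wider ones. The sketch below is my own illustration with a plain `Map`, not the structure built in the talk, and the prefixes and owner names are made up:

```javascript
function ipToInt(ip) {
  return ip.split('.').reduce((acc, part) => acc * 256 + Number(part), 0);
}

// Round an address down to the start of its block for a given prefix length.
function networkOf(address, bits) {
  const blockSize = 2 ** (32 - bits);
  return Math.floor(address / blockSize) * blockSize;
}

// Key each entry by "network/bits" so a lookup is at most 33 Map probes.
function buildTable(entries) {
  const table = new Map();
  for (const [cidr, owner] of entries) {
    const [base, bitsText] = cidr.split('/');
    const bits = Number(bitsText);
    table.set(`${networkOf(ipToInt(base), bits)}/${bits}`, owner);
  }
  return table;
}

// Try the longest (most specific) prefix first, then shorter ones.
function lookup(table, ip) {
  const address = ipToInt(ip);
  for (let bits = 32; bits >= 0; bits--) {
    const owner = table.get(`${networkOf(address, bits)}/${bits}`);
    if (owner !== undefined) return owner;
  }
  return undefined; // no covering prefix
}

const table = buildTable([
  ['10.0.0.0/8', 'AS-WIDE'],      // hypothetical owners
  ['10.1.0.0/16', 'AS-SPECIFIC'],
]);
console.log(lookup(table, '10.1.2.3')); // AS-SPECIFIC
console.log(lookup(table, '10.9.9.9')); // AS-WIDE
```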

Process
1. Reading a 2 GB file, use stream (want an intro? stream-adventure)
2. Representation – tried doing parent/children, then made it into one object
3. Collapse: merge neighbours with same value, remove duplicates, merge adjacent empty space
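Steps 1 and 3 can be sketched roughly as follows. The line format (`start end value`, whitespace separated, sorted by start) is an assumption made up for this example and is not the real BGP table format; `readRanges` and `collapse` are hypothetical names:

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Step 1: read line by line from any readable stream, so a 2 GB file
// never has to be handed over as one giant string.
async function readRanges(input) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  const ranges = [];
  for await (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const [start, end, value] = line.trim().split(/\s+/);
    ranges.push({ start: Number(start), end: Number(end), value });
  }
  return ranges;
}

// Step 3: merge neighbours with the same value. `ranges` must be sorted
// by `start`, and `end` is inclusive. Exact duplicates are absorbed too.
function collapse(ranges) {
  const merged = [];
  for (const range of ranges) {
    const last = merged[merged.length - 1];
    const touches = last && range.start <= last.end + 1;
    if (touches && range.value === last.value) {
      last.end = Math.max(last.end, range.end); // grow the previous range
    } else {
      merged.push({ ...range }); // copy so the input is not mutated
    }
  }
  return merged;
}

// An in-memory stream stands in for fs.createReadStream('table.txt').
readRanges(Readable.from(['0 9 A\n10 19 A\n20 29 B\n']))
  .then((ranges) => console.log(collapse(ranges)));
```

This version still collects every parsed range in an array before collapsing; for the full table you would want to collapse as lines arrive.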

Connections

Looking up the top 10,000 domains against the ~8 million open servers.

Problem: processing the ~400 GB response each week.

Use a map over the stream with a limit on the number of concurrent operations.
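The idea of capping concurrency can be sketched with plain promises. Note this works over an array rather than a stream, so it is a simplification of what the talk described, and `mapLimit` is a name I made up:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in input order. `limit` is assumed to be >= 1.
async function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // claim an index before awaiting
      results[index] = await worker(items[index], index);
    }
  }

  // Start `limit` runners; each pulls the next item when it finishes one.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Usage with a pretend slow lookup (setTimeout stands in for network I/O).
const slowDouble = (n) => new Promise((resolve) => setTimeout(() => resolve(n * 2), 5));
mapLimit([1, 2, 3, 4, 5], 2, slowDouble).then((out) => console.log(out)); // [ 2, 4, 6, 8, 10 ]
```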

Want to play with data? satellite.cs.washington.edu / npm install ip2country / progressbar-stream

Kevin Dela Rosa: Adding intelligence to your JS applications

There is a lot of data out there.

Machine Learning Concepts

subfield of AI that focuses on algorithms which can learn from and make predictions from data.

Can adapt to your users to provide more personalized experiences, recognize patterns, replace humans in monotonous tasks, separate signal from noise

Types:

classification
regression (number prediction)
clustering
other (e.g. density estimation, collaborative filtering)

Pipeline

Collect
Formulate task
Train algorithm with subset of data (and tune)
Evaluate learned predictor against unseen data
Deploy predictor and apply on new/live data

Example: how people perceive a topic on social media

Features: the representation of examples/input. For sentiment, this is typically a bag of words, but tweets are really short, with lots of jargon, abbreviations, and hashtags. Normalize by creating equivalence classes for the query term, usernames, URLs, and multiple consecutive occurrences of a letter in a word, and use a special tokenizer (e.g. Twokenizing).
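A rough sketch of the normalization step for usernames, URLs, and repeated letters, followed by a bag-of-words count. The placeholder tokens `USERNAME` and `URL`, the function names, and the whitespace split are my own assumptions; a real tweet tokenizer is considerably smarter, and the query-term class is left out here:

```javascript
// Collapse tweets into equivalence classes before counting words.
function normalizeTweet(text) {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, 'URL')   // any link -> URL
    .replace(/@\w+/g, 'USERNAME')        // any mention -> USERNAME
    .replace(/([a-z])\1{2,}/g, '$1$1');  // 3+ repeated letters -> 2 ("sooooo" -> "soo")
}

// Bag of words: token -> count. A Map avoids clashes with keys such as
// "constructor" that exist on plain objects.
function bagOfWords(text) {
  const counts = new Map();
  for (const token of normalizeTweet(text).split(/\s+/).filter(Boolean)) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

console.log(normalizeTweet('@Bob I looooove Node!!! http://t.co/abc #js'));
// USERNAME i loove node!!! URL #js
```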

Algorithm: which one to choose? There are hundreds to choose from. Start with your favourite classifier, one that is simple and that you are familiar with. Examples: Neural Network, Naive Bayes, Logistic Regression.
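To give a feel for how small a "simple and familiar" classifier can be, here is a toy multinomial Naive Bayes with add-one smoothing. It is a sketch written for these notes, not code from the talk; the training sentences, labels, and function names are all made up:

```javascript
const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

// Count documents and words per class.
function train(examples) {
  const model = { classes: new Map(), vocabulary: new Set(), total: 0 };
  for (const { text, label } of examples) {
    if (!model.classes.has(label)) {
      model.classes.set(label, { docs: 0, words: 0, counts: new Map() });
    }
    const cls = model.classes.get(label);
    cls.docs++;
    model.total++;
    for (const word of tokenize(text)) {
      cls.counts.set(word, (cls.counts.get(word) || 0) + 1);
      cls.words++;
      model.vocabulary.add(word);
    }
  }
  return model;
}

// Pick the class with the highest log prior + summed log likelihoods.
function classify(model, text) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, cls] of model.classes) {
    let score = Math.log(cls.docs / model.total);
    for (const word of tokenize(text)) {
      const count = cls.counts.get(word) || 0;
      // Add-one (Laplace) smoothing so unseen words do not zero the score.
      score += Math.log((count + 1) / (cls.words + model.vocabulary.size));
    }
    if (score > bestScore) {
      best = label;
      bestScore = score;
    }
  }
  return best;
}

const model = train([
  { text: 'I love this talk', label: 'positive' },
  { text: 'great awesome demo', label: 'positive' },
  { text: 'I hate this bug', label: 'negative' },
  { text: 'terrible awful crash', label: 'negative' },
]);
console.log(classify(model, 'love this awesome demo')); // positive
console.log(classify(model, 'awful bug'));              // negative
```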

Evaluate based on accuracy, precision (fraction of retrieved documents that are relevant), recall (fraction of relevant documents that are retrieved)
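These three metrics can be computed from predicted and actual labels as in the sketch below. The function name `evaluate` and the sample labels are made up; note that precision or recall comes out as `NaN` when its denominator is zero:

```javascript
// Compare predictions with the true labels for one "positive" class.
function evaluate(predicted, actual, positive) {
  let tp = 0; // predicted positive, actually positive
  let fp = 0; // predicted positive, actually something else
  let fn = 0; // predicted something else, actually positive
  let correct = 0;
  predicted.forEach((label, i) => {
    if (label === actual[i]) correct++;
    if (label === positive && actual[i] === positive) tp++;
    else if (label === positive) fp++;
    else if (actual[i] === positive) fn++;
  });
  return {
    accuracy: correct / predicted.length,
    precision: tp / (tp + fp), // of everything flagged, how much was right
    recall: tp / (tp + fn),    // of everything relevant, how much was found
  };
}

const scores = evaluate(
  ['pos', 'pos', 'neg', 'neg', 'pos'], // predictions
  ['pos', 'neg', 'neg', 'pos', 'pos'], // ground truth
  'pos'
);
console.log(scores); // accuracy 3/5, precision 2/3, recall 2/3
```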

Deployment Considerations

may be sensitive to false positive/negative
not all models are built the same; make system performance part of evaluation
retrain/complete feedback loop regularly

Another Important AI Task

Examples: Cat face detection

Resources

* Data: Amazon Web Services Public Data Sets, Socrata Open Data, UC Irvine Machine Learning Repository, Reddit?
* Cloud/Hosted End-to-End: Amazon Machine Learning, Azure ML, Dato, BigML, PredictionIO
* NPM Libraries: Natural Language Processing (stanford-corenlp, natural, opennlp), etc.
* APIs: AlchemyAPI, echonest, wit.ai