A little sleepy this morning, but I’m certain the talks will help wake us up with the first part of CascadiaJS Server Day.
Jennifer Wong: I Think I Know What You’re Talking About, But I’m Not Sure
Sometimes when I’m in conversation with other developers, it’s … what?
“There are two hard things in computer science. Cache invalidation and naming things.”
Our conversations can get really complicated. A lot of our jargon makes no sense.
Parameter vs. Argument
What’s the difference? Is there one?
parameter – a measurable factor which helps to define a particular system; refer to variable as found in the function definition
argument – statements and reasoning in support of a proposition; actual parameter refers to the actual input passed
In Math there is a clear definition, but in computer science, the definition is flipped.
Parameters help define a function. Arguments are passed into a function.
Colloquially, we use them interchangeably.
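A minimal JavaScript sketch of the distinction. The function name `greet` and its inputs are made up for illustration.

```javascript
// `name` and `greeting` are parameters: variables named in the function definition.
function greet(name, greeting) {
  return greeting + ', ' + name + '!';
}

// 'Jennifer' and 'Hello' are arguments: the actual values passed in the call.
const message = greet('Jennifer', 'Hello');
console.log(message);

// `greet.length` counts the declared parameters, not the arguments of any call.
console.log(greet.length);
```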
Scope: the portion of source code in which a binding of a name with an entity applies
We’ve had the term since 1960, but it is still difficult to define.
Lexical vs. Dynamic Scope
- Lexical Scope: portion of the source code; variable’s definition is resolved by searching its containing block or function, then moving to the outer containing block.
- Dynamic Scope: portion of run time; calling function is searched then the function which called the function.
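JavaScript is lexically scoped, so a small sketch can show which binding wins. The names `label`, `show` and `caller` are hypothetical.

```javascript
const label = 'global';

function show() {
  // Lexical scope: `label` is resolved from where `show` is defined,
  // so it finds the outer binding above.
  return label;
}

function caller() {
  // This binding shadows `label` only inside `caller`; `show` never sees it.
  const label = 'caller';
  return show();
}

const result = caller();
console.log(result);
```

With lexical scope the lookup follows the source code, so `show` finds the outer `label`. Under dynamic scope the lookup would follow the call chain instead, and `show` would find the `label` declared in `caller`.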
Recursion refers to a method of defining functions in which the function being defined is applied within its own definition…
behaviour defined by 1) a simple base case, 2) a set of rules that reduce all other cases toward the base case
a function that calls itself
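A classic sketch of a function that calls itself, with the base case and the reducing rule marked. Factorial is just a convenient example here, not something from the talk.

```javascript
function factorial(n) {
  // Base case: stops the chain of self-calls.
  if (n <= 1) return 1;
  // Reducing rule: every other case moves one step toward the base case.
  return n * factorial(n - 1);
}

console.log(factorial(5));
```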
a linking together, to link together
CRM vs. CMS vs. CDN
* Customer Relationship Management (Salesforce)
* Content Management System (WordPress)
* Content Delivery Network (Amazon CloudFront)
SaaS vs. Sass
* Software as a Service
* Syntactically Awesome Style Sheets (CSS extension language)
* Document Object Model: structured representation of the document and defines a way that the structure can be accessed.
GUI & CLI
* GUI: Graphical User Interface
* CLI: Command-Line Interface
* the async option of jQuery’s AJAX method can be set to false -> SJAJ
Really explain the concepts to people, especially when someone gives you a confused look.
Will Scott: Scanning the internet with Node.js
Can’t just map where a site originates based on its CDN.
Measuring the Internet’s Stars – What can we learn about internet availability from a single measurement machine?
DNS: My computer asks the DNS resolver to get the number. If it doesn’t know, then it will ask one of the nameservers. There are lots of services that will do this for you.
- ~8 million resolvers
- ~5000 ISPs
- ~169 Countries with >20 Resolvers
Scans to find DNS resolvers and then have them resolve domains to see what answers they give.
* ~100 million servers
* ~20,000 HTTP Proxies (~70 countries)
Trying this from a bunch of vantage points turns out to be basically as much trouble as dealing with a distributed system in the first place.
An IPv4 address has 4 parts and is 32 bits in total, e.g. 192.168.1.1
When the address space seemed limitless, blocks were given out as class A, then B and C. Then came the question of what happens if someone wants more than 256 addresses, and the realization that the space is limited.
CIDR – the number of prefix bits that are fixed, which shows how much of the address space the organization owns, e.g. 192.168.1.1/24
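To make the 32-bit and prefix ideas concrete, here is a small sketch. The helper names `ipToInt` and `inCidr` are hypothetical, and the code assumes well-formed input.

```javascript
// Convert dotted-quad IPv4 text into an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, part) => ((acc << 8) | Number(part)) >>> 0, 0);
}

// Test whether `ip` falls inside the block described by `cidr`.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // Mask with `bits` leading ones. JavaScript shifts are modulo 32,
  // so a /0 prefix needs its own case.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

console.log(ipToInt('192.168.1.1'));
console.log(inCidr('192.168.1.77', '192.168.1.1/24'));
console.log(inCidr('192.168.2.1', '192.168.1.1/24'));
```

With a /24 the first 24 bits are fixed, which leaves 8 bits, or 256 addresses, inside the block.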
Geolocation tables e.g. MaxMind
* might get multiple hits for the same IP
* might not get any response
There must be a definitive map. Technical incarnation using BGP. Many publish their BGP tables.
Problems: table ~2 GB, ~21 million lines; need to find the more specific ones, because there is some overlap
Build a table for fast lookup, reasonable size, creation speed.
1. Reading a 2 GB file, use stream (want an intro? stream-adventure)
2. Representation – tried doing parent/children, then made it into one object
3. Collapse: merge neighbours with same value, remove duplicates, merge adjacent empty space
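One way the representation and collapse steps above could look, as a rough sketch rather than the speaker's actual code. It flattens overlapping prefixes into one sorted list of segments so that the most specific prefix wins, then merges neighbours with the same value. All function names and the sample prefixes and values are made up.

```javascript
function ipToInt(ip) {
  return ip.split('.').reduce((acc, part) => ((acc << 8) | Number(part)) >>> 0, 0);
}

// Turn 'a.b.c.d/n' into an inclusive integer range plus its prefix length.
function cidrToRange(cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  const size = 2 ** (32 - bits);
  const start = Math.floor(ipToInt(base) / size) * size;
  return { start, end: start + size - 1, bits };
}

// Binary search: index of the last segment whose start is <= ip.
function findSegment(segments, ip) {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (segments[mid].start <= ip) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

function buildTable(entries) {
  // Each segment applies from its start up to the next segment's start.
  let segments = [{ start: 0, value: null }];
  const ranges = entries
    .map(({ cidr, value }) => ({ ...cidrToRange(cidr), value }))
    .sort((a, b) => a.bits - b.bits); // broad prefixes first, specific ones overwrite

  for (const { start, end, value } of ranges) {
    const after = end + 1;
    const inSpace = after < 2 ** 32;
    // Remember what applied just past this range so it can resume there.
    const resume = inSpace ? segments[findSegment(segments, after)].value : null;
    segments = segments.filter((s) => s.start < start || s.start > after);
    segments.push({ start, value });
    if (inSpace) segments.push({ start: after, value: resume });
    segments.sort((a, b) => a.start - b.start);
  }

  // Collapse: drop a segment when its left neighbour has the same value.
  return segments.filter((s, i, all) => i === 0 || all[i - 1].value !== s.value);
}

function lookup(table, ip) {
  return table[findSegment(table, ipToInt(ip))].value;
}

const table = buildTable([
  { cidr: '10.0.0.0/8', value: 'US' },
  { cidr: '10.1.0.0/16', value: 'CA' },
  { cidr: '10.2.0.0/16', value: 'CA' },
]);
console.log(table.length, lookup(table, '10.1.2.3'), lookup(table, '10.9.9.9'));
```

The two adjacent /16 blocks collapse into a single segment, and a lookup is one binary search over a flat array. Re-sorting on every insert is fine for a sketch but would be too slow for millions of lines.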
Looking at the top 10,000 domains and checking them against 8 million open servers.
Problem: processing the ~400 GB response each week.
Set up a map over the stream with a limit on how many operations run concurrently.
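A dependency-free sketch of a concurrency-limited map. It works on a plain array instead of a Node stream to stay short, so it illustrates the idea rather than the stream setup used in the talk. `mapWithLimit` and the doubling worker are hypothetical.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    // Each runner claims the next unprocessed index until none remain.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Usage: track how many workers are active to confirm the limit holds.
let active = 0;
let peak = 0;
const done = mapWithLimit([1, 2, 3, 4, 5], 2, async (n) => {
  active += 1;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 10));
  active -= 1;
  return n * 2;
});
done.then((doubled) => console.log(doubled, 'peak concurrency:', peak));
```

Results keep the input order because each runner writes to the index it claimed.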
Want to play with data? satellite.cs.washington.edu / npm install ip2country / progressbar-stream
Kevin Dela Rosa: Adding intelligence to your JS applications
There is a lot of data out there.
Machine Learning Concepts
subfield of AI that focuses on algorithms which can learn from and make predictions from data.
Can adapt to your users to provide more personalized experiences, recognize patterns, replace humans in monotonous tasks, separate signal from noise
- regression (number prediction)
- other (e.g. density estimation, collaborative filtering)
- Formulate task
- Train algorithm with subset of data (and tune)
- Evaluate learned predictor against unseen data
- Deploy predictor and apply on new/live data
Example: how people perceive topic on social media
Features: representation of examples/input. For sentiment this is typically a bag of words, but tweets are really short, with a lot of jargon, abbreviations and hashtags. Normalize by creating equivalence classes for the query term, usernames, URLs, and multiple consecutive occurrences of a letter in a word; use a special tokenizer (e.g. Twokenizing).
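A rough sketch of that normalization. The placeholder tokens, the rule of collapsing three or more repeated letters down to two, and the whitespace tokenizer are assumptions for illustration; this is not the Twokenizer. It also assumes a non-empty query term.

```javascript
// Map query term, usernames, URLs and stretched words onto shared tokens.
function normalizeTweet(text, queryTerm) {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, 'URL')
    .replace(/@\w+/g, 'USERNAME')
    .split(queryTerm.toLowerCase()).join('QUERY_TERM')
    .replace(/([a-z])\1{2,}/g, '$1$1') // 'soooo' and 'sooooooo' both become 'soo'
    .split(/\s+/)
    .filter(Boolean);
}

// Bag of words: token -> number of occurrences, ignoring order.
function bagOfWords(tokens) {
  const counts = new Map();
  for (const token of tokens) counts.set(token, (counts.get(token) || 0) + 1);
  return counts;
}

const tokens = normalizeTweet(
  '@will I looooove #cascadiajs soooo much http://example.com/x',
  'cascadiajs'
);
const bag = bagOfWords(tokens);
console.log(tokens);
```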
Algorithm: which one to choose? There are hundreds. Start with a classifier you already know, something simple and familiar. Examples: Neural Network, Naive Bayes, Logistic Regression.
Evaluate based on accuracy, precision (fraction of retrieved items that are relevant), recall (fraction of relevant items that are retrieved)
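As an example of a simple starting point, here is a minimal multinomial Naive Bayes sketch with add-one smoothing. The function names and the tiny training set are made up, and real sentiment data would need far more examples.

```javascript
// Count documents per label and word occurrences per label.
function train(examples) {
  const model = { docs: new Map(), words: new Map(), totals: new Map(), vocab: new Set(), n: 0 };
  for (const { tokens, label } of examples) {
    model.n += 1;
    model.docs.set(label, (model.docs.get(label) || 0) + 1);
    if (!model.words.has(label)) model.words.set(label, new Map());
    const counts = model.words.get(label);
    for (const token of tokens) {
      counts.set(token, (counts.get(token) || 0) + 1);
      model.totals.set(label, (model.totals.get(label) || 0) + 1);
      model.vocab.add(token);
    }
  }
  return model;
}

// Pick the label with the highest log P(label) + sum of log P(token | label).
function classify(model, tokens) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, docCount] of model.docs) {
    let score = Math.log(docCount / model.n);
    const counts = model.words.get(label);
    const total = model.totals.get(label) || 0;
    for (const token of tokens) {
      // Add-one smoothing keeps unseen words from zeroing out a label.
      score += Math.log(((counts.get(token) || 0) + 1) / (total + model.vocab.size));
    }
    if (score > bestScore) {
      best = label;
      bestScore = score;
    }
  }
  return best;
}

const model = train([
  { tokens: ['love', 'this', 'talk'], label: 'positive' },
  { tokens: ['great', 'talk'], label: 'positive' },
  { tokens: ['boring', 'slides'], label: 'negative' },
  { tokens: ['hate', 'this', 'bug'], label: 'negative' },
]);
console.log(classify(model, ['love', 'great']), classify(model, ['boring', 'bug']));
```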
- may be sensitive to false positive/negative
- not all models built the same, make system performance part of evaluation
- retrain/complete feedback loop regularly
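The accuracy, precision and recall measures can be computed from counts of true/false positives and negatives. A small sketch, with a hypothetical `evaluate` helper and made-up predictions:

```javascript
// `predicted` and `actual` are parallel arrays of booleans (true = positive class).
function evaluate(predicted, actual) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let tn = 0;
  predicted.forEach((p, i) => {
    if (p && actual[i]) tp += 1;
    else if (p && !actual[i]) fp += 1;
    else if (!p && actual[i]) fn += 1;
    else tn += 1;
  });
  return {
    accuracy: (tp + tn) / predicted.length,
    precision: tp / (tp + fp), // of everything flagged positive, how much was right
    recall: tp / (tp + fn),    // of everything truly positive, how much was found
  };
}

const scores = evaluate(
  [true, true, false, false, true],
  [true, false, true, false, true]
);
console.log(scores);
```

Precision and recall come out as NaN when their denominators are zero, for instance when nothing is predicted positive, so a real evaluation would need to handle that case.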
Another Important AI Task
Examples: Cat face detection
* Data: Amazon Web Services Public Data Sets, Socrata Open Data, UC Irvine Machine Learning Repository, Reddit?
* Cloud/Hosted End-to-End: Amazon Machine Learning, Azure ML, Dato, BigML, PredictionIO
* NPM Libraries: Natural Language Processing (stanford-corenlp, natural, opennlp), etc.
* APIs: AlchemyAPI, echonest, wit.ai