L p -Sampling - Documents - docslide.us [PDF]

Jan 1, 2016 - Lp-Sampling David Woodruff IBM Almaden Joint work with Morteza Monemizadeh TU Dortmund; Given a stream of

2 downloads 15 Views 114KB Size

Recommend Stories

Jan 19, 2016 - Penelitian pada Koperasi Karyawan Sumber Terang Kupang. ... Penjualan Bersih Modal Kerja Bersih Kali Rasio Perputaran Piutang Digunakan untuk mengukur berapa lama penagihan piutang selama satu periode Penjualan Kredit Piutang Kali Rasi

l@(p)©[l~ ~(Q)

The greatest of richness is the richness of the soul. Prophet Muhammad (Peace be upon him)

P L Lincoln House

Ask yourself: Can discipline be learned? Next

P+L V11 N2

Ask yourself: If I could change one thing in my life, what would I change and why? Next

Budgeting, Forecasting, & P&L

Learning never exhausts the mind. Leonardo da Vinci

P. armeniaca L

You can never cross the ocean unless you have the courage to lose sight of the shore. Andrè Gide

[PDF] Sampling of Populations

Forget safety. Live where you fear to live. Destroy your reputation. Be notorious. Rumi

Mineral Batuan PDF - Documents [PDF]

Jul 24, 2015 - Mineral Batuan Pdf Macam-Macam Mineral & Batuan Batuan adalah semua bahan penyusun kerak bumi dan biasanya berupa agregat mineral-mineral yang telah mengeras, (Kosmono). Batuan menurut genesanya (asal batuan) dibagi menjadi batuan beku

Spell - Documents - docslide.com.br [PDF]

Aug 5, 2015 - ... psico-fÃsicas psicÃ³logo psicÃ³logos psicÃ³pata psicoanÃ¡lisis psicoanalÃtico psicoanalista psicodÃ©lico psicofÃsica psicolingÃ¼Ãstica psicolingÃ¼Ãsticas psicolÃ³gica psicolÃ³gico psicolÃ³gico,ca psicolÃ³gicos psicologÃa psico

Asty wahyu - Documents [PDF]

Apr 12, 2017 - 100406033 - WAHYU ARDHININGTIKA (1).pdf. SEJARAH PERKEMBANGAN ARSITEKTUR ROMAWI SEBELUM REVOLUSI INDUSTRI MAKALAH Disusun Oleh : WAHYU ARDHININGTIKA NIM. 100406033 UNIVERSITAS SUMATRA UTARA FAKULTAS TEKNIK JURUSANâ¦

Idea Transcript

(https://docslide.com.br/register.html)

(https://docslide.com.br/)

HOME (HTTPS://DOCSLIDE.COM.BR/) LEADERSHIP (HTTPS://DOCSLIDE.COM.BR/CATEGORY/LEADERSHIP-MANAGEMENT.HTML) TECHNOLOGY (HTTPS://DOCSLIDE.COM.BR/CATEGORY/TECHNOLOGY.HTML) EDUCATION (HTTPS://DOCSLIDE.COM.BR/CATEGORY/EDUCATION.HTML) MORE TOPICS (HTTPS://DOCSLIDE.COM.BR/CATEGORY.HTML)

Home (https://docslide.com.br/) / Documents (https://docslide.com.br/category/documents.html) / L p -Sampling (https://docslide.com.br/documents/l-p-sampling.html)

L p -Sampling Category

View

Download

Posted on

REPORT (HTTPS://DOCSLIDE.COM.BR/REPORT-COPYRIGHT/L-P-SAMPLING) Documents (https://docslide.com.br/category/documents.html) 22 0 01-JAN-2016

RECOMMENDED (https://docslide.com.br/documents/samplingalgorithms-for-l-2-regression-and-applicationsmichael-w-mahoney.html)

Sampling algorithms for l 2 regression and applications Michael W. Mahoney Yahoo Research http://www.cs.yale.edu/homes/mmahoney (Joint work with P. Drineas. (https://docslide.com.br/documents/samplin algorithms-for-l-2-regression-andapplications-michael-wmahoney.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/samplingalgorithms-and-core-sets-for-l-p-regression-andapplications-michael.html)

Sampling algorithms and core-sets for L p regression and applications Michael W. Mahoney Yahoo Research ( For more info, see: http://www.cs.yale.edu/homes/mmahoney. (https://docslide.com.br/documents/samplin algorithms-and-core-sets-for-l-pregression-and-applicationsmichael.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/auditsampling-yelifia-wulan-p.html)

Audit Sampling (Yelifia Wulan P.) (https://docslide.com.br/documents/auditsampling-yelifia-wulan-p.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/business/unit-2-p-1sampling.html)

Unit 2 p 1 Sampling (https://docslide.com.br/business/unit2-p-1-sampling.html) Business

(https://docslide.com.br/category/business.html) (https://docslide.com.br/documents/probabilitysampling-types-of-probability-sampling-designsl-simple-random.html)

Probability Sampling. Types of Probability Sampling Designs l Simple random sampling l Stratified sampling l Systematic sampling l Cluster (area) sampling. (https://docslide.com.br/documents/probabi sampling-types-of-probabilitysampling-designs-l-simplerandom.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/technology/teamprojectslide-p-l-p.html)

Team Projectslide P L P (https://docslide.com.br/technology/teamprojectslide-p-l-p.html) Technology

(https://docslide.com.br/category/technology.html) (https://docslide.com.br/documents/fonema-l-pm-558450462d3dd.html)

Fonema L (P, M) (https://docslide.com.br/documents/fonema l-p-m-558450462d3dd.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/fonema-l-pm.html)

Fonema L (P, M) (https://docslide.com.br/documents/fonema l-p-m.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/l-p-nutrisikurang.html)

L P Nutrisi Kurang (https://docslide.com.br/documents/lp-nutrisi-kurang.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/ejercicio-p-lentera.html)

Ejercicio p. l. Entera (https://docslide.com.br/documents/ejercicio p-l-entera.html) Documents

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/l-pdermatitis-kontak.html)

l p Dermatitis Kontak (https://docslide.com.br/documents/lp-dermatitis-kontak.html) Documents

(https://docslide.com.br/category/documents.html)

Lp-Sampling David Woodruff IBM Almaden Joint work with Morteza Monemizadeh TU Dortmund Given a stream of updates (i, a) to coordinates i of an n-dimensional vector x |a| < poly(n) a is an integer stream length < poly(n) Output i with probability |xi|p/Fp, where Fp = |x|pp = i=1n |xi|p Easy cases: p = 1 and updates all of the form (i, 1) for some i Solution: choose a random update in the stream, output the coordinate it updates [Alon, Matias, Szegedy] Generalizes to all positive updates p = 0 and there are no deletions Solution: min-wise hashing, hash all distinct coordinates

(https://docslide.com.br/documents/teste-l-ptextopoetico.html)

Teste l p Textopoetico (https://docslide.com.br/documents/testel-p-textopoetico.html) Documents

(https://docslide.com.br/category/documents.html)

as you see them, maintain the minimum hash and item [Broder, Charikar, Frieze, Mitzenmacher] [Indyk] [Cormode, Muthukrishnan] Our main result For every 0 · p · 2, there is an algorithm that with probability · n-100 fails, and otherwise outputs an I in [n]

(https://docslide.com.br/documents/l-p-v-07-012015.html)

Algorithm is 1-pass, poly(-1 log n)-space and update time, and also returns wi = (1 ± )|xj|p/Fp Generalizes to 1-pass n1-

L P V. 07-01-2015 (https://docslide.com.br/documents/lp-v-07-01-2015.html)

2/ppoly(-1 log n)-space for p > 2 “additive-error” samplers Pr[I = j] = |xj|p/Fp ± Fp given explicitly in [Jayram, W] implicitly

Documents

for which for all j in [n] Pr[I = j] = (1 ± )|xj|p/Fp Condition on every invocation succeeding in any poly(n)-time algorithm

(https://docslide.com.br/category/documents.html)

in [Andoni, DoBa, Indyk, W] Lp-sampling solves and unifies many well-studied streaming problems: Solves Sampling with Deletions: [Cormode, Muthukrishnan, Rozenbaum] want importance sampling with deletions: maintain a sample i with probability |xi|/|x|1 Set p = 1 in our theorem [Chaudhuri, Motwani, Narasayya] ask to sample from the result of a SQL operation, e.g., self-join Set p = 2 in our theorem [Frahling, Indyk, Sohler] study maintaining approximate range spaces and costs of Euclidean spanning trees They need and obtain a routine to sample a point from a set undergoing insertions and deletions Alternatively, set p = 0 in our theorem

(https://docslide.com.br/documents/l-pthalasemia.html)

L P Thalasemia (https://docslide.com.br/documents/lp-thalasemia.html) Documents

(https://docslide.com.br/category/documents.html)

Alternative solution to Heavy Hitters Problem for any Fp: Output all i for which |xi|p > Á Fp Do not output any i for which |xi|p < (Á/2) Fp Studied by Charikar, Chen, Cormode, Farach-Colton, Ganguly, Muthukrishnan, and many others Invoke our

(https://docslide.com.br/documents/p-v-l.html)

Solves Block Heavy Hitters: given an n x d matrix, return indices i of rows Ri with |Ri|pp > Á ¢ j |Rj|pp [Andoni, DoBa,

p + v = l (https://docslide.com.br/documents/pv-l.html)

Indyk] study the case p = 1 Used by [Andoni, Indyk, Kraughtgamer] for constructing a small-size sketch for the Ulam

Documents

algorithm O~(1/Á) times, use approximations to values Optimal up to poly(-1 log n) factors

metric under the edit distance Treat R as a big (nd)-dimensional vector Sample an entry (i, j) using our theorem for general p The probability a row i is sampled is |Ri|pp/ j |Rj|pp, so we can recover IDs of all the heavy rows. We do not use Cauchy random variables or Nisan’s pseudorandom generator, could be more practical than [ADI] Alternative Solution to Fk-Estimation for any k ¸ 2: Optimal up to poly(-1 log n) factors Reduction given by [Coppersmith, Kumar]: Take r = O(n1-2/k) L2-samples wi1, … , wir In parallel estimate F2, call it F2’ Output (F2’/r) * j wijk-2 Proof: second moment method First algorithm not to use Nisan’s pseudorandom generator

(https://docslide.com.br/category/documents.html) (https://docslide.com.br/documents/fonema-l-pmpdf.html)

Fonema L (P, M).pdf (https://docslide.com.br/documents/fonema l-p-mpdf.html) Documents

Solves Cascaded Moment Estimation: Given an n x d matrix A, Fk(Fp)(A) = j |Aj|pkp Problem initiated by [Cormode,

(https://docslide.com.br/category/documents.html)

Muthukrishnan] Show F2(F0)(A) uses O(n1/2) space if no deletions Ask about complexity for other k and p For any p in [0,2], gives O(n1-1/k) space for Fk(Fp)(A) We get entry (i, j) with probability |Ai, j|p/ i’, j’ |Ai’, j’|p Probability row Ai is returned is Fp(Ai)/ j Fp(Aj) If 2 passes allowed, take O(n1-1/k) samples Ai, in 1st pass, compute Fp(Ai) in 2nd pass, and feed into Fk AMS estimator To get 1 pass, feed row IDs into an O(n1-1/k)-space algorithm of [Jayram, W] for estimating Fk based only on item IDs Algorithm is space-optimal [Jayram, W] Our theorem with p = 0 gives O(n1/2) space for F2(F0)(A) with deletions Ok, so how does it work? General Framework [Indyk, W] St = {i | |xi| in [t-1, t)} for = 1 + £()

St contributes if |St|pt ¸ ³ Fp(x), where ³ = poly(/log n)

Streamj is stream restricted to updates (i, c) with h(i) · n/2j Suppose 2j ¼ |St|. Then Streamj contains about 1 item of St Fp(Streamj) ¼ Fp(x)/2j |St| pt ¸ ³ Fp(x) means pt ¸ ³ Fp(Streamj) Can find the item in St in Streamj with Fp-heavy hitters algorithm Repeat the sampling poly(-1log n) times, count number of times there was an item in Streamj from St Use this to estimate sizes of contributing St, and Fp(x) ¼ t |St|pt 1. Form streams by subsampling 2. Run Heavy hitters algorithm on streams 3. Use heavy hitters to estimate contributing St Additive Error Sampler [Jayram, W] For contributing St, we also get poly(-1log n) items from the heavy hitters routine If the sub-sampling is sufficiently random (Nisan’s generator, min-wise independent), these items are random in St Since we have (1 ± )-approximations s’t to all contributing St, can: Choose a contributing t with probability s’tpt/t’ s’t’pt Output a random heavy hitter found in St For item i in contributing St, Pr[i output] =[s’tpt/t’ s’t’pt] ¢ 1/|St| = (1 ± )|xi|p/Fp

For item i in non-

contributing St, Pr[i output] = 0 Relative Error in Words Force all classes to contribute Inject additional coordinates in each class whose purpose is to make every class contribute Inject just enough so that overall, Fp does not change by more than a (1+)-factor Run [Jayram, W]sampling on resulting vector If the item sampled is an injected coordinate, forget about it Repeat many times in parallel and take the first repetition that is not an injected coordinate Since injected coordinates only contribute O() to Fp mass, small # of repetitions suffice Some Minor Points Before seeing the stream, we don’t know which classes contribute, so we inject coordinates into every class For St = {i | |xi| in [t-1, t)}, inject £(Fp/(pt # classes)) coordinates, where # classes = O(-1log n) Need to know Fp just guess it, verify at end of stream For some classes, £(Fp/(pt # classes)) < 1, e.g. if t is very large, so we can’t inject any new coordinates Find all elements in these classes and (1 ± )-approximations to their frequencies separately using a heavy hitters algorithm When sampling, either choose a heavy hitter with the appropriate probability, or select from contributing sets using [Jayram, W] There is a Problem The [Jayram, W]-sampler fails with probability ¸ poly(/log n), in which case it can output any item This is due to some of the subroutines of [Indyk, W] that it relies on, which only succeed with this probability So the large poly(/log n) additive error is still there Cannot repeat [Jayram, W] multiple times for amplification, since we get a collection of samples, and no obvious way of detecting failure On the other hand, could just repeat [Indyk, W] and take the median for the simpler Fk-estimation problem Our solution: Dig into the guts of the [Indyk, W] algorithm Amplify success probability to ¸ 1 – n-100 of subroutines A Technical Point About [Indyk, W] In [Indyk, W], Create log n substreams Streamj, where Streamj includes each coordinate independently with probability 2-j Can find the items in contributing St in Streamj with Fp-heavy hitters Repeat the sampling poly(-1log n) times, observe the fraction there is an item in Streamj from St Can use [Indyk, W] to estimate every |St| since every class contributes Issue of misclassification St = {i | |xi| in [t-1, t)}, and Fp-heavy hitters algorithm only reports approximate frequencies of items i it finds If |xi| = t, it may be classified into St or St+1 – it doesn’t matter Simpler solution than in [Indyk, W] If item misclassified, just classify it consistently if we see it again Equivalent to sampling from x’ with |x’|p = (1 ± )|x|p Can ensure with probability ¸ 1-n-100, we obtain st’ = (1 ± )|St| for all t A Technical Point About [Jayram, W] Since we have st’ = (1 ± )|St| for all t Choose a class t with probability s’tpt/t’ s’t’pt Output a random heavy hitter found in St How do we output a random item in St ? Min-wise independent hash function h For each i in St, h(i) = minj in St h(j) with probability (1 ± )/|St| h can be an O(log 1/)-wise independent hash function We recover i* in St for which h(i*) is minimum Compatible with sub-sampling, where Streamj is items i for which h(i) · n/2j Our goal is to recover i* with probability ¸ 1-n-100 We have st’, and look at the level j* where |St|/2j* = £(log n) If h is O(log n)wise independent, then with probability ¸ 1-n-100, i* is in Streamj* A worry: maybe Fp(Streamj*) >> Fp(x)/2j* so Heavy Hitter algorithm doesn’t work Can be resolved with enough independent repetitions Beyond the Moraines: Sampling Records Given an n x d matrix M of rows M1, …, Mn, sample i with probability |Mi|X/j |Mj|X, where X is a norm If i sampled, return a vector v for which |v|X = (1 ± )|Mi|X Applications Estimating planar EMD [Andoni, DoBa, Indyk, W] Sampling records in a relational database Define classes St = {i | |Mi|X in [t-1, t)} for = 1 + £() If we have a heavy hitters algorithm for rows of a matrix, then we can apply a similar approach as before Space should be d ¢poly(-1log n) Heavy Hitters for Rows Algorithm in [Andoni, DoBa, Indyk, W] Partition rows into B buckets In each bucket maintain the vector sum of rows hashed to it If |Mi|X > j |Mj|X, and if v is the vector in the bucket containing Mi, by the triangle inequality |v|X < |Mi|X + |Noise|X ¼ |Mi|X + j |Mj|X/B |v|X > |Mi|X - |Noise|X ¼ |Mi|X – j |Mj|X/B For B large enough, noise translates to a relative error Lower Bounds For every 0 · p · 2, there is a randomized algorithm that with probability · n-100 outputs FAIL, and otherwise outputs an I in [n] for which for all j in [n] Pr[I = j] = (1 ± )|xj|p/Fp Algorithm is 1-pass, poly(-1 log n)-space and time, returns wi = (1 ± )|xj|p/Fp For p > 2, gives n1-2/ppoly(-1 log n)-space. Can we output FAIL with probability 0? Requires (n) space for any . Reduction from 2-party equality testing with no error Given that we don’t output FAIL, can we get a sampler with = 0? Yes for 2-pass algorithms, using rejection sampling. 1-pass requires (n) space if algorithm outputs the corresponding probability wi (needed in many applications). Reduction from the 2-party INDEX problem Can we use less space for p > 2? Requires (n1-2/p) space for any . Reduction from L1-estimation Can improve to (n1-2/plog n) using augmented L1-estimation [Jayram, W] Some Open Questions 1-pass algorithms for Lp-sampling If we output FAIL with probability · n-100, and don’t require outputting the sampled item’s probability, can we get = 0 with low space? and log n factors are large. What is the optimal dependence on them? Useful for Fk-estimation for k > 2, and other applications Sampling from other distributions Given a vector (x1, …, xn) in a data stream, for which functions g can we sample from the distribution ¹(i) = |g(xi)|/j |g(xj)|? E.g., random walks Thank you (https://docslide.com.br/documents/l-p-sampling.html)

DESCRIPTION L p -Sampling. David Woodruff IBM Almaden Joint work with Morteza Monemizadeh TU Dortmund. Given a stream of updates (i, a) to coordinates i of an n -dimensional vector x…

TOP RELATED

P l _153_11_c___reforma_codigo_nal_transit (https://docslide.com.br/newspolitics/p-l15311creformacodigonaltransito.html) News & Politics

assume p > 0 in talk Let h:[n] -> [n] be a hash function Create log n substreams Stream1, Stream2, …, Streamlog n

Download (https://docslide.com.br/download/link/lp-sampling)

(https://docslide.com.br/news-politics/p-l15311creformacodigonaltransito.html)

(https://docslide.com.br/category/newspolitics.html) View more (https://docslide.com.br/search?q=L+p+-Sampling)

…

SAMPLING ALGORITHMS AND C…

…

SAMPLING L EXTERNAL VALIDIT…

…

SURVEY SAMPLING L SAMPLING …

Sampling algorithms and core-sets for L p

(adsbygoogle = window.adsbygoogle || []).push({});

regression and applications. Michael W. Mahoney

Slide 1 Survey sampling l Sampling & non-sampling

Yahoo Research ( For more info, see:

error l Bias l Simple sampling methods l Sampling

http://www.cs.yale.edu/homes/mmahoney ). 14 Models,…

We built a platform for members to share

0

COMPANY

212

0

terminology l Cluster…

CONTACT & LEGAL

documents and knowledge. And we are not related to any other website

About (https://docslide.com.br/about.html) Contact (https://docslide.com.br/contacts.html)

Terms (https://docslide.com.br/info/terms.html)

212

0

OPENING HOURS Monday to Saturday 9:00am to 5:00pm Sunday: CLOSED

DMCA (https://docslide.com.br/info/dmca.html)

STARTUP - SHARE TO SUCCESS

(https://facebook.com/d (https://twitter.com (https://goo

L p -Sampling - Documents - docslide.us [PDF]

Recommend Stories

Idea Transcript

Helpful Links

Smile Life

Get in touch