Programmer help needed

Starmaker · Post by **Starmaker** » Mon Oct 26, 2015 11:59 am

2016's post here.

I'm going to be funemployed real soonish and currently searching for a new job. This is what they want a prospective employee to solve and optimize the solution:

1. Given an array of points on a surface (defined by their Cartesian coordinates -- "for the sake of simplicity, the coordinates are integer numbers" -- this is verbatim), for each point, find the point closest to it and all points within double distance to the closest point.

2. Find the earliest appearance of a given sequence of digits in the sequence 1234567891011121314151617181920212223...

I solved #2. It has fuck all to do with substring search.

What's a fast way to do #1? (The only improvement over the dumbest possible solution I can imagine is not using the square root.)

RobbyPants · Post by **RobbyPants** » Mon Oct 26, 2015 12:23 pm

Does #1 specify a number of points? If the number is large, there could be a way that involves sorting ahead of time to reduce search space later.

I had to do something kind of like #1 in my last job, but I don't know that it was the most optimal solution. I picked the solution I did because it involved analyzing a few hundred thousand points in 3D space, and you really only cared about any point relative to other points nearby.

So, the idea I picked was to take a bunch of time (and memory) up front sorting the data by it's 3D location, so I'd have a much smaller number of points to compare. I basically made a jagged 4D array, where the first indicies of the first 3 dimensions corresponded to what "cube" the point existed in, in 3D space, and the 4th dimension was a list of points in that "cube".

That being said,you might be able to do the same type of thing here with a 2D array. The index could list it's location, and you'd store a number of point in those coordinates. Alternately, it could store bools/bits just to say whether or not any points are at that location. Without the sort, it looks like you'd be doing a nested loop (O^2 complexity) to analyze the points, followed by another, shorter nested loop to sort your hits. If you were to to plop them all into a 2D array where each index corresponded to its actual position, for any given point you're checking, you would know which other points are within double distance, which would limit your search space. You would start at that "center" of that search space and look outward, breaking out of the search once you found a hit.

So, that would have O complexity for setting up the grid, and each search after that would be technically O^2, but for a much shorter list. So, this would be faster on a really long set of points, but would perform worse on a short list. If the problem doesn't specify, they probably are not looking for something like this.

DSMatticus · Post by **DSMatticus** » Mon Oct 26, 2015 3:08 pm

The first is a space partitioning problem.

If you partition a large space, you can restrict your search only to the partitions which could possibly contain solutions. If I asked you if there were any pizza places within 25 miles of my house in Ohio, you would not look at pizza places in Texas. No point in Texas is within 25 miles of any point in Ohio. Texas - and the many many pizza places therein - are eliminated from consideration in one fell swoop simply by virtue of the fact that the partition is so far away it cannot possibly contain any solutions.

The space partitioning algorithm everyone comes up with on their own is the one RobbyPants describes, and the best way to visualize it is that you're grid papering your input. Once your input is divided into these neat, even, repeating cells, you need only consider the cells which might contain solutions; collectively those cells will contain some number of points, and you will perform the usual O(N^2) search on them.

The "robust" space partitioning algorithm is recursive partitioning. Divide the space in two. If there are still too many points in either half, divide that half in two. If there are still to many points in either half, divide that half in two. If there are still okay you know what recursion is I'm sure you've grasped the basic concept. Structurally, you arrange this like a binary (or quad-, or oc-) tree whose leaves are all of the final partitions (and lists of the points contained therein). To find partitions that might contain solutions, you just traverse the tree, remembering that if the parent cannot contain a solution none of its children can.

While the first type of algorithm will work in 99% of real world cases (and is used in 99.99% real world cases, even if it's one of the .99% in which it won't), do note that the second type of algorithm adapts to peculiarities of the input automatically (while the first one can just fail spectacularly as every input lands in the same cell). This is why it's the answer that gets you an A on exams and this is why it's the answer your prospective employer probably wants to see. So I suppose my advice is google quadtrees and get comfortable with them.

schpeelah · Post by **schpeelah** » Mon Oct 26, 2015 5:53 pm

Create a second array to hold point distances from the given point. Find the minimal point, saving the distances into the array, then get all points within double distance. Complexity: 2N comparisons and N calculations.

The partitioning would be OK if you could reuse it for multiple queries against the same array of points, at which point you need to know the acceptable tradeoff between setup for the first query and the time of each subsequent query.

No fancy tricks will help if the scenario is to only do this once. If you have a reasonable bound on max distance, you could make an array of stacks to sort points into for the second pass, but when your input is an array of pairs of integers in an arbitrary order nothing can save you from from calculating distance for each of them and comparing to find the minimum - the total complexity of everything else you do has to therefore beat simply reading all the distances again and comparing to be an optimization.

This problem might just be to find people who don't actually know the basics or try to overcomplicate things. My current job's interview programming task was IIRC to just implement insert sort in Java. Those things tend to be basic.

RobbyPants · Post by **RobbyPants** » Mon Oct 26, 2015 7:01 pm

schpeelah wrote:Create a second array to hold point distances from the given point. Find the minimal point, saving the distances into the array, then get all points within double distance. Complexity: 2N comparisons and N calculations.

That doesn't make sense. Distance is relative from two points.

The problem explicitly says to calculate it per point.

for each point, find the point closest to it and all points within double distance to the closest point.

So, you're approach is 2N * N, where you're guaranteeing that each point must be compared to all others. Partitioning limits the search size.

Ancient History · Post by **Ancient History** » Mon Oct 26, 2015 7:23 pm

schpeelah is thinking in terms of vectors and vector addition, I think.

schpeelah · Post by **schpeelah** » Mon Oct 26, 2015 8:10 pm

Ancient History wrote:schpeelah is thinking in terms of vectors and vector addition, I think.

No, I just missed the part about it being for each point rather than just a single chosen point.

Longes · Post by **Longes** » Tue Oct 27, 2015 9:24 am

This should help.

Starmaker · Post by **Starmaker** » Thu Nov 05, 2015 11:37 am

edit:

So turns out the in-person test is a GROUP task. Well fuck me sideways.
Does anyone have any pointers to shitty advice for HR managers on how to conduct those? Like "watch interviewees carefully, whoever scratches his or her nose when talking was born when Mars was ascendant, hire them"?

---------------
Thank you, everyone!

I've been invited for an in-person interview. I have no idea how they conduct those -- there'll be a test, but what it will entail and what sources will be available is anyone's guess. Google fails me; where it didn't fail me was to reveal they hire, like, 1.5 people per year. So my chances are not good.

But at least I understand recursion now.
(I'm never posting another Xzibit meme.)

Lokey · Post by **Lokey** » Thu Nov 05, 2015 11:15 pm

They always vary, it's whatever the HR troll you land is looking for. Try company work philosophy webstuff especially stuff that's repeated a lot, section headers or figure annotations (probably in reverse order since HR trolls are lazy).

Do not do real work for free. Do not fling feces. Do not authoritatively answer questions you aren't sure about. After that, it's a crapshoot anywhere these days

TarkisFlux · Post by **TarkisFlux** » Fri Nov 06, 2015 4:29 pm

I'm curious what your solution for #2 looked like Star.

Starmaker · Post by **Starmaker** » Fri Nov 06, 2015 7:04 pm

TarkisFlux wrote:I'm curious what your solution for #2 looked like Star.

https://github.com/Zephyrine/hh/blob/master/linotype.py

Starmaker · Post by **Starmaker** » Fri Sep 30, 2016 6:26 pm

New year and a new attempt! (Also, the chance of funemployment is now higher than ever: they forced me to take a fucking literacy test written by an illiterate teambuilding coach, which I failed.)

This is Stage 1, which only requires submitting a web form with numerical answers, no source code, no time constraints, and I already sent in my answers (kept clearing the cookie until it gave me an easy problem set). So this is for my personal edification only.

Let f(n) be the smallest positive integer m such that m! (note the bang) is divisible by positive integer n (e.g. f(5) = 5, f(6) = 3, f(10) = 5).

Find the sum of f(n) for a large number of large consecutive integers (say, 990'000'000 to 1'000'000'000).

What I did:

- factorized each number, e.g. 1'000'000'000 = 2^9 * 5^9

- for every prime factor p of every number, I found a multiple of said factor m(p) which together with its preceding multiples contains just enough total factors to make up p's power:

Code: Select all

2 * 4 * 6 * 8 * 10 * 12 divides 2^9
1 + 2 + 1 + 3 + 1  + 2 = 10 &#40;greater than 9&#41;

5 * 10 * 15 * 20 * 25 * 30 * 35 * 40 divides 5^9
1 + 1  + 1  + 1  + 2  + 1  + 1  + 1  = 9

Then the maximum of m(p) (in this case, the maximum of 12 and 40, which is 40) is the f(n) that we want. Sum them up.

However, factorization is still slooooooow, so the code runs for several hours. Brute force runs for a month. For comparison, the second-slowest problem takes 80 seconds.

Is there a faster way to do this?

schpeelah · Post by **schpeelah** » Fri Sep 30, 2016 7:17 pm

How did you factorize? My first instinct would be a modified sieve of Eratosthenes which should be n^2, not all that slow everything considered.

In fact, let's ditch factorization. Compute consecutive m!, then add m to the answer for every multiple of m! not already marked in the giant bool array.

Pixels · Post by **Pixels** » Fri Sep 30, 2016 8:08 pm

Yeah, you can attack from the factoral side with something like this:

Code: Select all

public int calculate&#40;int n&#41;
&#123;
	long fact = 1;
	for &#40;int m = 2; m < n; m++&#41;
	&#123;
		fact = &#40;fact * m&#41; % n;
		if &#40;fact == 0&#41;
			return m;
	&#125;
	return n;
&#125;

However, it won't work well for primes or semi-primes. f(n)=n if n is prime, and f(n)=p if n=p*q for p, q prime with p>q. It's not too much effort to filter out primes with Fermat though, and your input numbers are small enough that it might not matter anyhow.

schpeelah · Post by **schpeelah** » Fri Sep 30, 2016 9:03 pm

Pixels, isn't that the brute force?

Anyway, final idea (maybe). You want to calculate f(n) for some large number of ns < N.

Code: Select all

int arr&#91;N&#93;; // presumed initialized to 0
int M = 1; // holds m!
for&#40;int m = 2;M<N;m++&#41;&#123;
    M = M * m;
    for&#40;int i = M; i<N; i=i+M&#41; &#123;
        if&#40;arr&#91;i&#93; == 0&#41;&#123;
            arr&#91;i&#93; = m;
        &#125;
    &#125;
&#125;

Should be something like (log(n))^2 or better, I'm not sure how to denote an inverse of !. Except of course initializing the array and computing the final sum are linear. Seems good enough that I think it's a serious competitor even if you don't need the first 990'000'000 numbers.

Pixels · Post by **Pixels** » Fri Sep 30, 2016 9:53 pm

You... really have not thought this through very thoroughly. Or tested it. Your algorithm (as written) will literally just fill the entire array with alternating 0's and 2's. f(n)=m means that n should divide m!, not the other way around.

Off the top of my head, I can't think of a good sieve-related solution. I'm sure there's a very elegant solution out there, but I'm far too rusty on my number theory for it to rise to mind.

With a primality test, some fiddling to deal with semi-primes of the form 2*p or 3*p, and a bunch of threading I can get the expected computation time (on my work box) on my solution down to about 15 hours. Not good enough that I'd want to ship it to customers, plenty good enough for a one-off calculation with no time constraints.

Starmaker · Post by **Starmaker** » Tue Oct 04, 2016 8:56 am

Round 2 has started, and there are two problems, as before.

Problem 1 is the rain trap in 3d.
Problem 2 is the good old "earliest appearance of sequence in string".

This time we're expected to use stdin-stdout, so submission checking is probably semi-automated -- great news. I was told Problem 2 was a hit last year meaning hardly anyone among the Round 3 in-person interviewees solved it, and if the trend holds, the checker will fail shitty solutions, there'll be fewer Round 3 contenders and I'll get an easy rejection because I'm so old I remember Raistlin. Still, it's more fun than literacy tests.

Blasted · Post by **Blasted** » Wed Oct 05, 2016 12:01 am

The thought of problem 2 being an issue is a bit jarring. Isn't it a standard algorithm? Maybe I'm too old and crusty for this stuff.

Ice9 · Post by **Ice9** » Wed Oct 05, 2016 3:11 am

I think he's referring to a sequence with the same characteristics as in the OP - one composed of ordered integers starting with 1.

Which can be done in log10(pattern) time if you just want a valid index. I'm not sure off hand what it would be if you wanted to find the earliest index - ie. "213" is at index 14.

Starmaker · Post by **Starmaker** » Wed Oct 05, 2016 9:46 am

Ice9 wrote:I think he's referring to a sequence with the same characteristics as in the OP - one composed of ordered integers starting with 1.

Which can be done in log10(pattern) time if you just want a valid index. I'm not sure off hand what it would be if you wanted to find the earliest index - ie. "213" is at index 14.

Basically, you look at the number and see if you can detect a pattern. I'll put my code back up in two weeks when the submission window closes. I googled the rain trap, after all.

Starmaker · Post by **Starmaker** » Wed Nov 02, 2016 9:43 am

"We regret to inform you we cannot extend an invitation to the School to you, as you possess sufficient knowledge to apply for a junior developer vacancy without additional education."

wat
(edit: I phoned their HR person and they're apparently moderately enthusiastic about "keeping me in mind" for when they have openings and/or recommending me to their partners, in stark contrast to last year's "meh gtfo". Time to dust off the cv.)

---
Sequence:
https://github.com/Zephyrine/hh16/blob/master/seq.py

Explanation:

1. Find the first natural number of the sequence which actually appears in the target pattern by generating sequence fragments of increasing "remoteness" and trying to match them to the pattern.
2. Calculate how many digits preceded it in the natural sequence.
3. Shift as needed.

To do 1,
for each possible length k of the target number from 1 to length_of_pattern
for each possible shift j of the target number from 0 to k-1
take the digits in positions [j..k+j) of the pattern, assemble a segment of the natural sequence (adding and possibly subtracting 1s as needed) induced by that number and see if it matches the pattern.
In the inner cycle, if a mismatch is found, stop this iteration and try another j; if a match is found, record the inducing number and its shift j as a possible answer and return the smallest of possible answers when the inner cycle ends
(as soon as we have a candidate, we don't need to check for possible candidates of greater lengths since they're obviously further down the natural sequence).

Example:

91029112
k = 1
j = 0: 9 + 1 = 10 == 10, 10 + 1 = 11 != 29
k = 2
j = 0: 91 + 1 = 92 != 02
j = 1: 10 - 1 = 9 == 9, 10 + 1 = 11 != 29
k = 3
j = 0: 910 + 1 = 911 != 291
j = 1: 102 - 1 = 101 != ??9
j = 2: 029 is not a natural number
k = 4
j = 0: 9102 + 1 = 9103 != 9112
j = 1: 1029 - 1 = 1028 != ???9
j = 2: 0291 is not a natural number
j = 3: 2911-1 = 2910 == ?910, 2911+1 = 2912 == 2???, pattern ends, record candidate;
we're out of js for this k and we're done, return (2911, 3).

Count the digits preceding the natural number 2911 in the sequence: 9*1+90*2+900*3+(2911-1000)*4
subtract 3 for the shift,
add 1 (per problem statement, we need the position of the first digit of the pattern)
and stick a fork in it.

Rain trap:
(provided for public benefit, since half the suggested solutions on the interwebs for the 2d trap are wrong)
https://github.com/Zephyrine/hh16/blob/master/flood.py

Explanation:

If either dimension is smaller than 3, the island won't hold water at all, return 0 (the general algorithm works in this case, too, but the check is cheaper).
Set total volume to 0.
Add the border cells to a heap (sorted by height), mark as visited.
Take a cell off the heap (once we're out of heap, we're done, return total volume).
For each its unvisited neighbor, add max(0, current height - neighbor height) to total volume, mark it as visited and add to the heap with height set to max(current height, neighbor height) [greater of its height and water level; water that we know isn't going anywhere effectively solidifies].

(If an HR person looking for a programmer is miraculously reading this, go ahead and PM me. I can sleep on a windowsill as long as I don't have to take literacy tests.)